Compressing Trigram Language Models With Golomb Coding
Abstract
Trigram language models are compressed using a Golomb coding method inspired by the original Unix spell program. Compression methods trade off space, time and accuracy (loss). The proposed HashTBO method optimizes space at the expense of time and accuracy. Trigram language models are normally considered memory hogs, but with HashTBO, it is possible to squeeze a trigram language model into a few megabytes or less. HashTBO made it possible to ship a trigram contextual speller in Microsoft Office 2007.
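To make the coding step concrete, here is a minimal sketch of the idea commonly attributed to the Unix spell program: sort a set of hash values, take the gaps between neighbours, and Golomb-code those gaps. It is an illustration under stated assumptions, not the HashTBO implementation. It uses the Rice special case (Golomb parameter M = 2**k), returns a string of '0'/'1' characters instead of packed bits, and all function names are hypothetical.

```python
def rice_encode(values, k):
    """Encode non-negative integers with a Rice code (Golomb code, M = 2**k)."""
    parts = []
    for v in values:
        q, r = v >> k, v & ((1 << k) - 1)
        parts.append("1" * q + "0")           # quotient in unary, terminated by 0
        if k:
            parts.append(format(r, f"0{k}b"))  # remainder in exactly k bits
    return "".join(parts)


def rice_decode(bits, k, count):
    """Decode `count` integers from a bit string produced by rice_encode."""
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":               # read the unary quotient
            q += 1
            pos += 1
        pos += 1                              # skip the terminating 0
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append((q << k) | r)
    return values


def compress_sorted(hashes, k):
    """Sort and deduplicate hash values, then Rice-code the gaps between them."""
    hashes = sorted(set(hashes))
    if not hashes:
        return "", 0
    gaps = [hashes[0]] + [b - a for a, b in zip(hashes, hashes[1:])]
    return rice_encode(gaps, k), len(gaps)


def decompress_sorted(bits, k, count):
    """Rebuild the sorted hash values by summing the decoded gaps."""
    out, total = [], 0
    for gap in rice_decode(bits, k, count):
        total += gap
        out.append(total)
    return out
```

Small gaps cost few bits, so a densely populated hash space compresses well. The price is sequential decoding, which is in line with the abstract's remark that space is optimized at the expense of time.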
Similar resources
N-Gram Language Model Compression Using Scalar Quantization and Incremental Coding
This paper describes a novel approach to compressing large trigram language models, which uses scalar quantization to compress log probabilities and back-off coefficients, and incremental coding to compress entry pointers. Experiments show that the new approach achieves roughly 2.5 times the compression ratio of the well-known tree-bucket format while keeping the perplexity and accessing ...
Generalized Golomb Codes and Adaptive Coding of Wavelet-Transformed Image Subbands
We describe a class of prefix-free codes for the nonnegative integers. We apply a family of codes in this class to the problem of runlength coding, specifically as part of an adaptive algorithm for compressing quantized subbands of wavelet-transformed images. On test images, our adaptive coding algorithm is shown to give compression effectiveness comparable to the best performance achievable by ...
Combining word prediction and r-ary Huffman coding for text entry
Two approaches to reducing effort in switch-based text entry for augmentative and alternative communication devices are word prediction and efficient coding schemes, such as Huffman. However, character distributions that inform the latter have never accounted for the use of the former. In this paper, we provide the first combination of Huffman codes and word prediction, using both trigram and l...
Compressing Integers for Fast File Access
Fast access to files of integers is crucial for the efficient resolution of queries to databases. Integers are the basis of indexes used to resolve queries, for example in large internet search systems, and numeric data forms a large part of most databases. Disk access costs can be reduced by compression, if the cost of retrieving a compressed representation from disk and the CPU cost of decodi...
Efficient Data Compression Technique Using Modified Adaptive Rice Golomb Coding for Wireless Sensor Network
Wireless sensor networks (WSNs) are energy-constrained networks, since each node in a WSN is typically powered by batteries with limited capacity. Compressing the data sensed at each sensor node in an energy-efficient manner is necessary for extending the lifetime of a wireless sensor network. In each sensor node the communication module is the main energy-consuming unit and therefore data c...
Publication date: 2007